Abstract



We study the interaction between network effects and external incentives on file sharing behavior in
Peer-to-Peer (P2P) networks. Many current or envisioned P2P networks reward individuals for sharing
files, via financial incentives or social recognition. Peers weigh this reward against the cost of sharing
incurred when others download the shared file. As a result, if other nearby nodes share files as well, the
cost to an individual node decreases. Such positive network sharing effects can be expected to increase
the rate of peers who share files.

In this paper, we formulate a natural model for the network effects of sharing behavior, which we term
the "demand model." We prove that the model has desirable diminishing returns properties, meaning
that the network benefit of increasing payments decreases when the payments are already high. This
result holds quite generally, for submodular objective functions on the part of the network operator.
In fact, we show a stronger result: the demand model leads to a "coverage process," meaning that
there is a distribution over graphs such that reachability under this distribution exactly captures the
joint distribution of nodes which end up sharing. The existence of such distributions has advantages in
simulating and estimating the performance of the system. We establish this result via a general theorem
characterizing which types of models lead to coverage processes, and also show that all coverage processes
possess the desirable submodular properties. We complement our theoretical results with experiments
on several real-world P2P topologies. We compare our model quantitatively against more naive models
ignoring network effects. A main outcome of the experiments is that a good incentive scheme should
make the reward dependent on a node's degree in the network.

1 Introduction 

Peer-to-Peer (P2P) file sharing systems have become an important platform for the dissemination of files, 
music, and other content. The basic idea is very simple: individuals make files available for download from 
their own machine. Other users can search for files they desire and download them from a peer who has 
made the file available. Naturally, designing systems such that the search and download of files are efficient 
poses many research challenges, which have received a lot of attention in the literature [2"ll22j. 

A second, and somewhat orthogonal, issue is how to ensure sufficient participation and sharing of files. 
Unless enough content is provided by individuals, the utility of membership will be very small. If
free-riding [9] is too prevalent, the system may exhibit a quick decrease in membership common to
public-goods type economic settings [25].

Thus, the P2P system must be designed with incentives in mind to encourage file sharing. These incentives
can take the form of monetary payments or redeemable "points" [11], download privileges, or simply
recognition. From the system designer's perspective, these payments should be "small," while ensuring 
enough participation. 



'Department of Computer Science, University of Southern California, CA 90089-0781, USA. E-mail: salek@usc.edu 
^Microsoft Corporation, One Microsoft Way, Redmond, WA, 98052-6399, USA. E-mail: shahins@microsoft.com. Work done 

while the author was at the University of Southern California. 

* Department of Computer Science, University of Southern California, CA 90089-0781, USA. E-mail: dkempc@usc.edu. 

Work supported in part by NSF CAREER Award 0545855 and an ONR Young Investigator Award. 



On the other hand, from a peer's perspective, the payments need to be weighed against the cost incurred 
by sharing a file. In this paper, we assume that the content is shared legally and the system is designed 
with security in mind: hence, the main cost to an individual is the upload bandwidth which will be used 
whenever another peer downloads a file from this node. 

Nodes will in general choose to download from nearby peers (in terms of bandwidth or latency). Therefore,
as additional nearby peers share the same files, the load will get distributed among more nodes, and the cost
to each individual node will decrease. Thus, not only will we expect cascading effects of sharing based on
social dynamics [12], but we would also expect these cascading effects to be based on a network structure
determined by point-to-point latencies and bandwidths. 

Our contribution in this paper is the definition and analysis (both theoretical and experimental) of a 
natural model for peers' sharing behavior in P2P systems, in the presence of network effects and economic 
incentives. In our model, we focus only on sharing one file; in practice, the model can be applied separately 
for each file of interest. The basic premise of the model is that each node has a certain demand for the file. 
Furthermore, the network determines which percentage of the demand will be met by downloading from each 
peer sharing the file. The crucial implication of this model is that the more nearby peers are sharing a file,
the more evenly the demand will be distributed among them. 

The upload bandwidth cost is compensated by a payment to the peers who make the file available. 
Again, our model is agnostic about whether these payments are monetary, recognition, or take other forms. 
In our model, the payments can be explicitly based on the network degree of peers, since high-degree nodes 
presumably serve a key role in propagating sharing behavior. 

We argue that this model captures the essential dynamics of P2P systems in which a peer can join the 
network and download files without sharing; hence, availability of files is not the only incentive for sharing. 
The FastTrack P2P protocol, used by KaZaA, Grokster, and iMesh, is an example where this assumption 
holds; hence, our model should be a reasonable approximation for these services in terms of its incentives. 

The network operator is interested in maximizing a social welfare function W, which grows monotonically 
as a function of the set of nodes that share the file. This function could be the total number of sharing 
nodes, the number of nodes with at least one uploading neighbor, or the total download bandwidth available 
to peers under various natural models of downloading. 

After defining this model formally (in Section 2), we prove strong and general diminishing returns properties
about it (in Section 3). In particular, we show that whenever W is monotone and submodular, the
network's social welfare as a function of the payments offered to the peers is monotone, i.e., increasing
payments will always increase social welfare. However, the rate of increase decreases when payments are
already high. We call the latter property diminishing returns.

To prove this result, we consider a slightly different model, wherein payments are combined with giving 
the network operator the ability to "force" some set S of peers to share. By first proving certain local 
submodularity properties for this modified model, the desired diminishing returns properties are implied by 
the general result of Mossel and Roch [18]. However, we derive a similar result to [18] for a broad subclass of
submodular functions which we call coverage functions. It consists of the functions for which in the underlying
process, the distribution of nodes sharing the file is equivalent to the distribution of nodes reachable from
S in an appropriately defined random graph model. We establish this equivalence via a general and non-trivial
theorem characterizing all functions that can be obtained by counting reachable nodes under random
graph models. As a corollary, our approach provides a much simpler proof of the main result from [18] for
coverage processes. Moreover, the fact that the propagation of sharing behavior is a coverage process is 
useful for the purpose of simulating the process and estimating the parameters of the system, allowing more 
efficient algorithms for simulations. Finally, our characterization can be of independent interest in the study 
of submodular set functions. 

While the bulk of our paper focuses on a theoretical analysis of the demand model, we complement the
theoretical results by an experimental evaluation of our model (in Section 4), using two network topologies
derived from real-world data sets [T3H201I1I] and a regular two-dimensional grid topology. We first show
that network effects are significant by comparing our demand model with one in which peers are not aware
of changes in load due to nearby sharing peers. We then evaluate different payment schemes, in particular 
regarding their dependence on nodes' degrees. We evaluate these both in terms of the fraction of peers that 
end up sharing, and the amount paid by the network operator per sharing node. 

1.1 Related Work 

There is a large body of work on incentive mechanisms in P2P file-sharing systems. (See [8] for a thorough
overview and [27] for a recent generalized analysis framework.) Incentive mechanisms can be classified in
three categories: barter-based mechanisms, reputation-based mechanisms, and currency-based mechanisms. 

Barter-based methods pQ enforce repeated transactions among peers by matching each peer to only a 
small subset of the network, hence raising the survival chance for strategies based on reciprocation. This 
method only works when we have a small and popular set of files. For instance, the BitTorrent protocol [5]
is a popular P2P file-sharing protocol using this method. 

Reputation-based mechanisms have an excellent track record at facilitating cooperation in very diverse 
settings, from evolutionary biology to marketplaces like eBay. These systems keep a tally of the contribution 
of each peer; the past contributions determine which peers obtain more of the system's resources in the future. 
However, the availability of cheap pseudonyms in P2P systems makes reputation systems vulnerable to Sybil 
and whitewashing attacks [5], leading to ongoing work on designing sybilproof reputation mechanisms [5].
Moreover, reputation systems may be vulnerable to coordinated gaming strategies due to distributed rating
systems [24].

Inspired by markets, a P2P system can also deploy a currency scheme to facilitate resource contributions 
by rational peers. Generally, peers earn currency by contributing resources to the system, and spend the 
currency to obtain resources from the system. Karma [3S] is one example of this kind. Currency-based
systems may also suffer from Sybil and whitewashing attacks, depending on their policies toward newcomers. 
If newcomers are endowed with a positive balance, then the system is vulnerable to these attacks; otherwise, 
there might not be enough incentive for newcomers to join the network. Balance control could also be 
troublesome, as the system might need to deal with negative balances. 

Lai et al. [16] introduced the concept of "private" history vs. "shared" history as a way to combine
barter-based and reputation-based mechanisms in the context of an evolutionary prisoner's dilemma. Shared history
is a pool that records peers' past behavior and services them according to their reputation. In [9], file sharing 
is modeled as a social phenomenon, akin to those discussed by Schelling [23]. Users consider whether or 
not to contribute files based on the number of other users who contribute. Our model is different in that it 
explicitly models the costs incurred by contributing nodes, rather than simply positing an intrinsic generosity 
parameter for each user. 

2 Models and Preliminaries 

We consider a peer-to-peer network with $n$ servers (or nodes or peers), and focus on the behavior of sharing
one particular file. Thus, each peer $v$ may either choose to share the file or to not share it. We also call
sharing peers active, and the other ones inactive. The set of all peers who share is denoted by $V^+$.

2.1 The Demand Model 

Each peer has a local demand $d_v$ for the file: this demand will originate from individual users on the server $v$
(who themselves might not possess the file or be in a position to make it available). The demand $d_v$ should
be served by downloading the file from other servers $u \in V^+$. The quality of the connection between $v$ and $u$
is captured by a matrix $P$: the larger $p_{v,u}$, the larger a fraction of $v$'s demand will be served by $u$ (assuming
that $u$ shares the file). Specifically, the demand that $u \in V^+$ will see from $v$ is

$$d_v \cdot \frac{p_{v,u}}{\sum_{u' \in V^+} p_{v,u'}}.$$

The matrix $P$ will in practice depend on network latencies or bandwidth, as well as explicit download
agreements. It need not be symmetric. For the purpose of the general model, we are agnostic to the derivation
of $P$; in Section 4, we will derive $P$ from measured network latencies by positing a latency threshold which
individuals are willing to tolerate.

A node $u \in V^+$ sharing the file will incur a cost of $c_u$ per unit of demand that it serves; this cost is the
result of using upload bandwidth, machine processing time, or similar resources. To encourage peers to share
the file despite this cost, the P2P network administrator offers payments $\pi_u$ to the nodes $u \in V^+$. These
payments need not be the same for all nodes, and can be derived from the network structure, e.g., a node's
degree.

Different nodes may have different (and unknown) tradeoffs between money and upload bandwidth. We
model this fact by assuming that each node $u$ has a tradeoff factor $\lambda_u$, drawn independently and uniformly
at random from $[0, 1]$, which captures how many units of bandwidth one unit of money is worth to the node.
Thus, the sharing utility of an active node $u \in V^+$ is

$$U(u) = \lambda_u \pi_u - c_u \sum_v d_v \cdot \frac{p_{v,u}}{\sum_{u' \in V^+} p_{v,u'}},$$



while the sharing utility of non-sharing nodes is 0. (A non-sharing node does not get paid and incurs no 
upload costs.) We assume that agents are rational, and thus choose whether to share or not to share so as 
to maximize their own utility. 
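
As an illustration of the utility just defined, the following minimal Python sketch computes the demand that a sharing node sees and its resulting utility. The function names, the dictionary-based representation of $P$, and the toy numbers are assumptions made for this example only; they do not come from the paper.

```python
def served_demand(u, sharing, demand, p):
    """Demand seen by sharing node u: sum_v d_v * p[v][u] / sum_{u' in sharing} p[v][u']."""
    total = 0.0
    for v, d_v in demand.items():
        denom = sum(p[v][w] for w in sharing)
        if denom > 0:  # skip nodes that no sharing peer can serve
            total += d_v * p[v][u] / denom
    return total


def sharing_utility(u, sharing, demand, p, cost, payment, lam):
    """Utility lam_u * pi_u - c_u * (served demand); non-sharing nodes get 0."""
    if u not in sharing:
        return 0.0
    return lam[u] * payment[u] - cost[u] * served_demand(u, sharing, demand, p)


# Hypothetical toy instance: three peers with identical connection quality.
demand = {0: 1.0, 1: 1.0, 2: 1.0}
p = {v: {w: 1.0 for w in range(3)} for v in range(3)}
cost = {0: 1.0, 1: 1.0, 2: 1.0}
payment = {0: 3.0, 1: 3.0, 2: 3.0}
lam = {0: 0.5, 1: 0.5, 2: 0.5}

alone = sharing_utility(0, {0}, demand, p, cost, payment, lam)        # serves all demand
with_peer = sharing_utility(0, {0, 1}, demand, p, cost, payment, lam)  # load is split
```

In this toy instance, node 0 has negative utility when sharing alone and zero utility once node 1 shares as well, which is the positive network effect the model is meant to capture.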

2.2 Other Models 

As we discussed in Section 1, one of our main contributions is the observation that file sharing behavior
should be subject to positive network externalities, i.e., that the presence of other sharing peers makes
sharing less costly. To quantify the size of such network effects, we define two alternative models with no or
limited effects; we will compare these two models experimentally with the demand model in Section 4.

1. In the No-Network Model, the peers completely ignore other sharing peers. Thus, a node $u$ assumes
that if it shares the file, then it will see a fraction $p_{v,u}$ of the demand originating with each node $v$. Hence,
the perceived utility of node $u$ when sharing is

$$U(u) = \lambda_u \pi_u - c_u \cdot \sum_v d_v p_{v,u}.$$

2. In the One-Hop Model, the peers are aware of network effects in a very limited way: node $u$ assumes
that any node $v$ sharing the file will contribute toward serving both $v$'s and its own demand, but not toward
serving the demand of any other node $w \neq u, v$. Thus, the perceived utility of node $u \in V^+$ in the
One-Hop Model is

U(U) = \ u TT u -C u -=; ■ C u ■ 2_^ ^ ■ 

2.3 Payment Schemes, Sharing Process, and Administrator's Objective 

The network administrator's choice is how to set the payment offers $\pi_u$. In doing so, the administrator
balances two competing goals: low overall payments and high utility for the participants in the system. In
this paper, we study the impact of payment schemes on these objectives.

In order to provide enough incentives for sharing, the network administrator should always ensure that
$\pi_u \geq C_u := c_u \cdot \sum_v d_v$. Otherwise, even a node $u$ with $\lambda_u = 1$ (i.e., the highest possible utility for money)
would have no incentive to share the file if no other peers are sharing the file.

The full model is thus as follows: after the administrator decides on the payments $\pi_u$ for all nodes
$u$, the random tradeoffs $\lambda_u$ between money and bandwidth are determined independently for all nodes $u$.
Subsequently, the process proceeds in iterations. In each iteration, all peers simultaneously decide whether 
to share the file or not, based on the payments, costs, and previous decisions of all other peers. The process 



continues until an equilibrium is reached. Notice that because the cost to a peer is monotone decreasing 
in the set V + of currently sharing peers, the set of sharing peers can only become larger from iteration to 
iteration. In particular, this implies that the process will eventually terminate with some set V + of active 
peers. We call this the sharing process or activation process. 
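
The iterative sharing process can be sketched as follows. This is a minimal illustration under assumptions of our own: the function names and the toy instance are hypothetical, and the loop only ever adds peers, relying on the monotonicity noted above (a peer's cost can only decrease as more peers share, so an active peer never wants to leave).

```python
def perceived_cost(u, active, demand, p, cost):
    """Cost to u if it shares alongside the currently active peers."""
    sharers = set(active) | {u}
    total = 0.0
    for v, d_v in demand.items():
        denom = sum(p[v][w] for w in sharers)
        if denom > 0:
            total += d_v * p[v][u] / denom
    return cost[u] * total


def sharing_process(nodes, demand, p, cost, payment, lam):
    """Iterate simultaneous best responses until no further peer joins."""
    active = set()
    while True:
        joiners = {
            u for u in nodes
            if u not in active
            and lam[u] * payment[u] > perceived_cost(u, active, demand, p, cost)
        }
        if not joiners:
            return active  # equilibrium reached
        active |= joiners


# Hypothetical toy instance: three peers, uniform connections, payment above C_u = 3.
nodes = [0, 1, 2]
demand = {u: 1.0 for u in nodes}
p = {v: {w: 1.0 for w in nodes} for v in nodes}
cost = {u: 1.0 for u in nodes}
payment = {u: 4.0 for u in nodes}

cascade = sharing_process(nodes, demand, p, cost, payment, {0: 1.0, 1: 0.5, 2: 0.3})
stalled = sharing_process(nodes, demand, p, cost, payment, {0: 0.7, 1: 0.5, 2: 0.3})
```

In the first run, node 0 shares on its own, which lowers the cost enough for node 1 to join, which in turn brings in node 2. In the second run no peer is willing to start, so the process terminates with no sharing peers.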

The network administrator is in general interested in increasing access to the file while keeping the 
payments low. This general objective may be captured using various metrics. In general, we allow for any 
overall social welfare function W which increases monotonically in the set S of sharing nodes. Notice that 
since the set S itself is the result of a random process, the administrator's goal will be to maximize E [W(S)], 
where S is derived from the random activation process in the demand model. Several social welfare functions 
W suggest themselves naturally: 

1. The number of active peers is a natural measure of participation. It is the measure frequently studied
in the context of the diffusion of innovations or behaviors in social networks [10, 12, 14, 15, 17, 18]. While
the objective is similar, the precise dynamics are different between those models and the demand model.

2. The total number of serviced nodes, i.e., nodes $v$ with at least one active node $u$ with $p_{u,v} > 0$. This
model is appropriate if we only care about how many peers can download the file, but not about
the quality of the connection. It implicitly assumes that each peer has a constant utility of 1 for
downloading.

3. Each node $u$ gets a utility of $\sum_{v \in V^+} p_{u,v}$, and the social welfare is the sum of all these utilities. This
model is based on the assumption that $u$'s demand is served by all of its neighbors (including possibly
$u$) simultaneously, and that $u$'s utility is the total "download bandwidth" available in this sense. We
call this the sum-welfare function.

4. Each node $u$ gets a utility of $\max_{v \in V^+} p_{u,v}$, and the social welfare is the sum of all these utilities. This
is based on the assumption that $u$'s demand is served by its active neighbor with the best connection,
corresponding to a situation where parallel download from multiple sources is not possible. We call
this the max-welfare function.

Notice that the social welfare function W may also include the utilities of the sharing nodes. 
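
The four welfare functions listed above can be written down directly. The sketch below is illustrative only: the function names are ours, and it reads `p[a][b]` as the quality with which node `a` can download from node `b` throughout, which for the second welfare function is an assumption about the index order (the toy matrix is symmetric, so the choice does not matter there).

```python
def num_active(active, p):
    """Welfare 1: number of sharing peers."""
    return len(active)


def num_serviced(active, p):
    """Welfare 2: nodes with at least one active peer they can download from."""
    return sum(1 for v in range(len(p)) if any(p[v][u] > 0 for u in active))


def sum_welfare(active, p):
    """Welfare 3: each node's utility is the sum of its connections to active peers."""
    return sum(sum(p[u][v] for v in active) for u in range(len(p)))


def max_welfare(active, p):
    """Welfare 4: each node's utility is its best connection to an active peer."""
    return sum(max((p[u][v] for v in active), default=0.0) for u in range(len(p)))


# Hypothetical three-node path: neighbors have quality 0.5, self-connection 1.0.
p = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]

one = {0}
two = {0, 2}
```

For example, with only node 0 active, two nodes are serviced and both sum-welfare and max-welfare equal 1.5; adding node 2 raises sum-welfare to 3.0 but max-welfare only to 2.5, since the middle node cannot combine its two sources under the max-welfare assumption.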

3 Theoretical Analysis of the Model 

The main analytical contribution of this paper is based on coverage processes (see footnote 2), defined
formally in Definition 5. Informally, a coverage process is a random process such that the distribution over
sets of ultimately active nodes is also the distribution of reachable nodes under a suitably chosen distribution
of random graphs. Our results on coverage processes are twofold:

(1) We give a general characterization of coverage processes, and show that the activation process for P2P
systems is a coverage process. (2) We give a significantly simplified proof (compared to the general result
of [18]) showing that under coverage processes, the expected social welfare as a function of the payments has
diminishing returns in the sense of Definition 1, so long as the social welfare is a submodular function of the
active nodes.

Recall that a function $f$ defined on sets is submodular if $f(S + v) - f(S) \geq f(T + v) - f(T)$ whenever
$S \subseteq T$, i.e., if the addition of an element to a larger set causes a smaller increase in the function value than
to a smaller set. Thus, submodularity is the discrete analogue of concavity, and intuitively corresponds to
"diminishing returns." An easy inductive proof (on the size of $X$) shows that submodularity is equivalent
to the condition that for all sets $X$,

$$f(S \cup X) - f(S) \geq f(T \cup X) - f(T) \quad \text{whenever } S \subseteq T. \tag{1}$$
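
For small ground sets, the definition of submodularity can be checked by brute force. The sketch below is a hypothetical helper written for illustration (exponential in the size of the ground set); the two example functions are our own and are not taken from the paper.

```python
from itertools import combinations


def subsets(ground):
    """Yield every subset of the ground set as a frozenset."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular(f, ground):
    """Check f(S + v) - f(S) >= f(T + v) - f(T) for all S subset of T and v outside T."""
    ground = frozenset(ground)
    all_sets = list(subsets(ground))
    for S in all_sets:
        for T in all_sets:
            if not S <= T:
                continue
            for v in ground - T:
                gain_small = f(S | {v}) - f(S)
                gain_large = f(T | {v}) - f(T)
                if gain_small < gain_large - 1e-12:
                    return False
    return True


# Example 1: number of items covered by the chosen sets (a classic submodular function).
covers = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c"}}


def coverage(S):
    return len(set().union(*(covers[i] for i in S)))


# Example 2: squared cardinality, whose marginal gains grow and so violate the condition.
def squared_size(S):
    return len(S) ** 2
```

Calling `is_submodular(coverage, {0, 1, 2})` returns `True`, while `is_submodular(squared_size, {0, 1, 2})` returns `False`.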



2 We thank Bobby Kleinberg for this naming suggestion, and also note here that Theorem 8 was derived independently by
him. 



Definition 1 A function $g : \mathbb{R}^n \to \mathbb{R}$ has diminishing returns if for every pair $i, j$ and all vectors $x$, it
satisfies

$$\frac{\partial^2 g(x_1, x_2, \ldots, x_n)}{\partial x_i \, \partial x_j} \leq 0.$$

Remark 2 The notion of "diminishing returns" is strictly weaker than concavity; it corresponds to concavity
only along positive coordinate axes.

The two main contributions of our paper together imply the following theorem as a corollary: 

Theorem 3 Let $W(\pi_1, \ldots, \pi_n) = E[W(S)]$ be the expected social welfare when set $S$ is obtained from the
sharing process of the demand model with payments $\pi_1, \ldots, \pi_n$.

If $W(S)$ is submodular, then $W(\pi_1, \ldots, \pi_n)$ is monotone and has diminishing returns with respect to the
payments $\pi_1, \ldots, \pi_n$.

For the social welfare function, the diminishing returns property intuitively means that the additional 
benefit in social welfare that can be derived from increasing the payment to a peer u decreases as the peers' 
current payments increase. 

The proof of Theorem [3] is based on analyzing the following Seed Set Model, which we define mainly for 
the purpose of analysis. 

Definition 4 (Seed Set Model) For each node $u$, the payment offered is $\pi_u = C_u$. Besides payments, we
have a seed set S of peers that will always share regardless of the payments. Subsequently, the process unfolds 
exactly according to the sharing process. 

The main technical step is to show that the Seed Set Model is a coverage process, in the following sense. 

Definition 5 (Coverage Process) Let $\phi(S)$ be the random variable describing the set of nodes active at
the end of a process starting from the set $S$ of nodes active. The process is called a coverage process if there
exists a distribution $D$ over graphs $G$ such that for each set $T$ of nodes, $\mathrm{Prob}[\phi(S) = T]$ equals the probability
that exactly $T$ is reachable starting from $S$ in $G$ if $G$ is drawn from the distribution $D$.

Remark 6 Without using our nomenclature, [14] showed submodularity for the Cascade and Threshold
models of innovation diffusion [10, 12] by establishing that both gave rise to coverage processes. Subsequently,
[15] showed that there are natural diffusion processes which are not coverage processes, yet have a submodular
function $E[|\phi(S)|]$.

We prove that the Seed Set Model is a coverage process in two steps. First, in Section 3.1, we give a
general and complete characterization of Coverage Processes. This characterization may be of interest in its 
own right, as coverage processes have a practical advantage: they can be simulated easily and efficiently, by 
first generating a random graph according to D, and then simply finding the set of reachable nodes. 
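
The simulation recipe described above (draw a graph, then compute reachability) can be sketched as follows. For simplicity, the sketch assumes a distribution in which each directed edge is present independently with a given probability; this is our own simplifying assumption, and the distribution $D$ of a general coverage process may correlate the incoming edges of a node. All names are hypothetical.

```python
import random
from collections import deque


def sample_graph(edge_prob, rng):
    """Draw one graph: keep each directed edge (a, b) independently with its probability."""
    graph = {}
    for (a, b), prob in edge_prob.items():
        if rng.random() < prob:
            graph.setdefault(a, []).append(b)
    return graph


def reachable(graph, seeds):
    """Breadth-first search for the set of nodes reachable from the seed set."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        a = queue.popleft()
        for b in graph.get(a, []):
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


def estimate_expected_size(edge_prob, seeds, trials, seed=0):
    """Monte Carlo estimate of the expected number of ultimately active nodes."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len(reachable(sample_graph(edge_prob, rng), seeds))
    return total / trials


# Hypothetical example with deterministic edges, so the outcome does not depend on the RNG.
edge_prob = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 0.0}
estimate = estimate_expected_size(edge_prob, {0}, trials=10)
```

Each trial costs one graph sample plus one graph search, which is the practical advantage of a coverage process over re-running the iterative activation dynamics.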

Then, in Section 3.2, we show that the Seed Set Process satisfies the conditions established in Section
3.1. Finally, in Section 3.3, we give a simple proof that for any coverage process and any submodular social
welfare function, the expected social welfare under the process is also submodular. This implies diminishing 
returns with respect to the payments. 

Remark 7 The fact that the tradeoffs $\lambda_u$ between money and bandwidth are uniformly random in $[0, 1]$ is
important to ensure the submodularity and diminishing returns properties. If the $\lambda_u$ are not random but
fixed, then the diminishing returns and submodularity properties cease to hold. Furthermore, in the Seed
Set Model, the optimization problem of finding the best seed set $S$ of at most $k$ nodes becomes very hard,
as we show in the appendix.

3.1 Characterization of Coverage Processes

In this section, we characterize exactly which random processes are coverage processes. This theorem may 
be of interest in its own right, when analyzing different processes. 

Our setting is exactly as in the paper by Mossel and Roch [18]: each node $u$ has an activation function
$f_u$, which is monotone non-decreasing and satisfies $f_u(\emptyset) = 0$. Each node independently chooses a threshold
$\theta_u \in [0, 1]$ uniformly at random, and becomes active when $f_u(S) > \theta_u$, where $S$ is the previously active set
of nodes. This process is repeated until no more changes occur.

In order to express our results concisely, we use the following discrete equivalent of a derivative (see,
e.g., [26]). For a function $f$ defined on sets, we define inductively:

$$f_\emptyset(S) = f(S), \qquad f_{R \cup \{v\}}(S) = f_R(S \cup \{v\}) - f_R(S).$$

It is not difficult to verify that this notion is well-defined, i.e., independent of which element v is chosen at 
which stage. 
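
The inductive definition translates directly into a short recursive function. The sketch below is illustrative; the function name and the example activation function $f(S) = 1 - 2^{-|S|}$ are our own choices and do not come from the paper.

```python
def discrete_derivative(f, R, S):
    """Compute f_R(S) via f_{R + v}(S) = f_R(S + v) - f_R(S), with f_emptyset = f."""
    R = frozenset(R)
    S = frozenset(S)
    if not R:
        return f(S)
    v = min(R)  # any element of R may be removed first; the result does not depend on it
    rest = R - {v}
    return discrete_derivative(f, rest, S | {v}) - discrete_derivative(f, rest, S)


# Hypothetical activation function: each active neighbor independently succeeds w.p. 1/2.
def f(S):
    return 1.0 - 0.5 ** len(S)


first = discrete_derivative(f, {0}, set())         # f({0}) - f({})
second = discrete_derivative(f, {0, 1}, set())     # second-order difference
third = discrete_derivative(f, {0, 1, 2}, set())   # third-order difference
```

For this example the first-order derivative is positive, the second-order one negative, and the third-order one positive again, the alternating sign pattern that the characterization below is concerned with.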

Theorem 8 The following conditions are necessary and sufficient for the process to be a coverage process. 

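• For all sets $T$ of odd cardinality $|T|$, as well as for $T = \emptyset$, and each node $u$, we have $(f_u)_T(\overline{T}) \geq 0$,
where $\overline{T}$ denotes the complement of $T$.

• For all sets $T$ of positive even cardinality $|T|$, and each node $u$, we have $(f_u)_T(\overline{T}) \leq 0$.

• $f_u(\emptyset) = 0$ for all $u$.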

To prove this theorem, we begin with the following reasoning. Focus on one node $u$, and its activation
function $f_u$. If there were an equivalent graph distribution $D$, then it would have to define a probability
$q_u(T)$ for the presence of edges from exactly the vertex set $T$ to $u$. These probabilities need to satisfy the
following property: if a set $S$ of nodes is active, then the probability of $u$ having at least one incoming edge
from $S$ must equal $f_u(S)$. Thus, a necessary and sufficient condition for being a coverage function is that
for each node $u$, there exists a distribution $q_u(T)$ over sets $T$ such that

$$f_u(S) = \sum_{T : T \cap S \neq \emptyset} q_u(T). \tag{2}$$

We can express this requirement more compactly using matrix notation. Let $\mathbf{f}_u$ be the $(2^n - 1)$-dimensional
vector consisting of all entries $f_u(S)$ for $S \neq \emptyset$. Similarly, let $\mathbf{q}_u$ be the $(2^n - 1)$-dimensional vector of
all $q_u(S)$ for $S \neq \emptyset$. Let $A$ be the $((2^n - 1) \times (2^n - 1))$-dimensional matrix indexed by non-empty subsets
such that $A_{S,T} = 1$ if and only if $S \cap T \neq \emptyset$, and $A_{S,T} = 0$ otherwise. ($A$ is called an incidence matrix [3].)
Then, Equation (2) can be rewritten as the requirement that for each node $u$, there exists a distribution $\mathbf{q}_u$
such that $A \cdot \mathbf{q}_u = \mathbf{f}_u$.

For the analysis, we fix a canonical ordering of subsets. Specifically, if the current (sub-)universe consists
of $k$ nodes indexed $\{1, 2, \ldots, k\}$, their canonical ordering is defined recursively as first containing all subsets
of $\{1, 2, \ldots, k - 1\}$ in canonical order, then the set $\{k\}$, followed by the sets $T \cup \{k\}$, where the sets
$T \subseteq \{1, 2, \ldots, k - 1\}$ appear in canonical order.

In order to find out when the distribution $\mathbf{q}_u$ exists, we want to solve the equation $A \cdot \mathbf{q}_u = \mathbf{f}_u$, or
$\mathbf{q}_u = A^{-1} \cdot \mathbf{f}_u$. While the inverses of some incidence matrices have been studied before (see, e.g., [3]), we are
not aware of any source explicitly giving the inverse of the matrix $A$. Hence, we establish here:

Lemma 9 The inverse of $A$ is the matrix $B$ defined by

$$B_{S,T} = \begin{cases} 0 & \text{if } S \cup T \neq \{1, \ldots, n\}, \\ (-1)^{|S \cap T| + 1} & \text{otherwise.} \end{cases}$$



Proof. The key insight is that under the canonical ordering of sets defined above, the matrices $A$ and $B$
can be defined recursively via matrices $A_k$ and $B_k$. Specifically, let $A_1 = B_1 = 1$, and

$$A_{k+1} = \begin{pmatrix} A_k & 0 & A_k \\ 0 & 1 & \mathbf{1} \\ A_k & \mathbf{1}^T & \mathbf{1}^T \mathbf{1} \end{pmatrix}, \qquad B_{k+1} = \begin{pmatrix} 0 & -u^T & B_k \\ -u & 0 & u \\ B_k & u^T & -B_k \end{pmatrix}.$$

The fact that $A = A_n$ and $B = B_n$ can be observed directly from the definition and the canonical ordering.
To prove the lemma, we can show by induction on $k$ that $A_k \cdot B_k = I_k$ for all $k$, where $I_k$ is the
$(2^k - 1) \times (2^k - 1)$ identity matrix. The base case $k = 1$ is obvious. For the inductive step to $k + 1$, consider
the $(i, j)$ entry $(A_{k+1} \cdot B_{k+1})_{i,j}$. We distinguish 7 different cases, based on the $(i, j)$ indices. (We use $0$ to
denote the $(2^k - 1) \times (2^k - 1)$ matrix of all zeroes, $\mathbf{1}$ for the row vector of all ones, and $u$ for the
$(2^k - 1)$-dimensional unit row vector with 1 in its last coordinate and 0 everywhere else.)

1. If $i, j < 2^k$, then the entry is $(A_k \cdot 0)_{i,j} + 0 + (A_k \cdot B_k)_{i,j} = (I_k)_{i,j}$ by induction hypothesis.

2. If $i > 2^k$, $j < 2^k$, then (writing $i' = i - 2^k$), the entry is $(A_k \cdot 0)_{i',j} - u_j + (\mathbf{1} \cdot B_k)_j = 0$, using
Lemma 10(a) below.

3. If $i < 2^k$, $j > 2^k$, then (writing $j' = j - 2^k$), the entry is $(A_k \cdot B_k)_{i,j'} + 0 - (A_k \cdot B_k)_{i,j'} = 0$.

4. If $i, j > 2^k$, then (writing $i' = i - 2^k$, $j' = j - 2^k$), the entry is
$(A_k \cdot B_k)_{i',j'} + u_{j'} - (\mathbf{1} \cdot B_k)_{j'} = (I_k)_{i',j'}$, again using Lemma 10(a).

5. If $i = j = 2^k$, a straightforward calculation shows that the entry is 1.

6. If $i = 2^k$, $j < 2^k$, then the entry is $(\mathbf{1} \cdot B_k)_j - u_j = 0$ by Lemma 10(a). Similarly, for $i = 2^k$, $j > 2^k$,
writing $j' = j - 2^k$, the entry is $u_{j'} - (\mathbf{1} \cdot B_k)_{j'} = 0$ by Lemma 10(a).

7. Finally, for $j = 2^k$, $i < 2^k$, the entry is $-(A_k \cdot u^T)_i + (A_k \cdot u^T)_i = 0$, whereas for $j = 2^k$, $i > 2^k$, writing
$i' = i - 2^k$, the entry is $-(A_k \cdot u^T)_{i'} + 1 = 0$ by Lemma 10(b).

This proves that $A_{k+1} \cdot B_{k+1} = I_{k+1}$. ■
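
Lemma 9 can also be sanity-checked numerically for small $n$. In the sketch below, non-empty subsets of $\{1, \ldots, n\}$ are encoded as bitmasks $1, \ldots, 2^n - 1$; increasing bitmask order happens to coincide with the canonical ordering described above. The helper names are ours, and the check is an illustration rather than a substitute for the proof.

```python
def incidence_matrix(n):
    """A[S][T] = 1 if the subsets S and T intersect, else 0 (non-empty subsets only)."""
    full = (1 << n) - 1
    masks = range(1, full + 1)
    return [[1 if s & t else 0 for t in masks] for s in masks]


def claimed_inverse(n):
    """B[S][T] = (-1)^(|S and T| + 1) if S union T is everything, else 0."""
    full = (1 << n) - 1
    masks = range(1, full + 1)
    return [
        [(-1) ** (bin(s & t).count("1") + 1) if (s | t) == full else 0 for t in masks]
        for s in masks
    ]


def matmul(x, y):
    """Plain integer matrix product of two square matrices."""
    size = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def is_identity(m):
    size = len(m)
    return all(m[i][j] == (1 if i == j else 0) for i in range(size) for j in range(size))


# Check A * B = I for n = 1, ..., 4 (matrices of size 1, 3, 7, and 15).
checks = {n: is_identity(matmul(incidence_matrix(n), claimed_inverse(n))) for n in range(1, 5)}
```

All four checks succeed; for $n = 2$, the matrix $B$ has rows $(0, -1, 1)$, $(-1, 0, 1)$, and $(1, 1, -1)$.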

Lemma 10 Let $\mathbf{1}$ be the vector of all 1's, and $u$ defined as in the proof of Lemma 9. Then, (a) $\mathbf{1} \cdot B_k = u$,
and (b) $A_k \cdot u^T = \mathbf{1}^T$.

Proof. For part (a), we show that the column sums of $B_k$ are all zero except for the last column, which has
a column sum of one. (Since $B_k$ is symmetric, the same holds for the row sums.) The proof is by induction.
The base case $B_1 = 1$ is clear. For the inductive step from $k$ to $k + 1$, first notice that the sums of the
columns $j < 2^k - 1$ are zero by induction hypothesis. For column $2^k - 1$, the column sum of $B_k$ contributes 1
by induction hypothesis, from which 1 is subtracted because of the entry in the middle row. Column $2^k$ adds
up to 0 explicitly, and columns $j = 2^k + 1, \ldots, 2^{k+1} - 2$ have terms of $B_k$ and $-B_k$ canceling out. Finally,
for the last column, the entries of $B_k$ and $-B_k$ cancel out, leaving the entry 1 from the middle row.

For part (b), simply notice that using part (a) and the induction hypothesis of Lemma 9 (for $k$), we get
that $A_k \cdot u^T = A_k \cdot B_k^T \cdot \mathbf{1}^T = A_k \cdot B_k \cdot \mathbf{1}^T = I_k \cdot \mathbf{1}^T = \mathbf{1}^T$. Here, we used that $B_k$ is symmetric. ■

The next lemma shows that so long as all $q_u(S)$ are non-negative, by setting $q_u(\emptyset)$ appropriately, we can
always obtain a probability distribution.

Lemma 11 With $q_u(S)$ defined by $\mathbf{q}_u = B \cdot \mathbf{f}_u$, we have $\sum_S q_u(S) \leq 1$.

Proof. Let $\mathbf{1}$ denote the all-ones vector as before. We can rewrite

$$\sum_S q_u(S) = \mathbf{1} \cdot (B \cdot \mathbf{f}_u) = (\mathbf{1} \cdot B) \cdot \mathbf{f}_u.$$

Using Lemma 10(a), the sum is exactly equal to $f_u(\{1, \ldots, n\}) \leq 1$, completing the proof. ■

By Lemma 9, we know that $\mathbf{q}_u = B \cdot \mathbf{f}_u$. And by Lemma 11, the entries sum up to at most 1. Thus, it
remains to show that the entries of $\mathbf{q}_u$ are non-negative if and only if $f_u$ satisfies the conditions of Theorem
8. To relate these formulations, we prove the following non-recursive characterization of discrete derivatives.

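Lemma 12 For all sets $T$, we have that

$$f_T(W) = \sum_{S \subseteq T} (-1)^{|T| - |S|} f(W \cup S).$$

Proof. The proof is by induction on $|T|$. For $T = \emptyset$, the claim is trivial. Now, consider a set
$T_{k+1} = T_k \cup \{t\}$ of size $k + 1$. By definition of the discrete derivative and induction hypothesis,

$$\begin{aligned}
f_{T_{k+1}}(W) &= f_{T_k}(W \cup \{t\}) - f_{T_k}(W) \\
&= \sum_{S \subseteq T_k} (-1)^{k - |S|} f(W \cup S \cup \{t\}) - \sum_{S \subseteq T_k} (-1)^{k - |S|} f(W \cup S) \\
&= \sum_{S \subseteq T_{k+1} : t \in S} (-1)^{k + 1 - |S|} f(W \cup S) + \sum_{S \subseteq T_{k+1} : t \notin S} (-1)^{k + 1 - |S|} f(W \cup S) \\
&= \sum_{S \subseteq T_{k+1}} (-1)^{k + 1 - |S|} f(W \cup S),
\end{aligned}$$

which completes the inductive proof. ■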

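Proof of Theorem 8. Fix any node $u$, and define $\mathbf{q}_u = B \cdot \mathbf{f}_u$. By Lemma 12, we can write the discrete
derivative of $f_u$ with respect to $T$ at the complement $\overline{T}$ as

$$(f_u)_T(\overline{T}) = \sum_{S \subseteq T} (-1)^{|T| - |S|} f_u(\overline{T} \cup S).$$

Now, if $|T|$ is odd, then $(-1)^{|T| - |S|} = (-1)^{|S| + 1}$, so we can rewrite the above as

$$\sum_{S \subseteq T} (-1)^{|S| + 1} f_u(\overline{T} \cup S) = \sum_{W \supseteq \overline{T}} (-1)^{|W \cap T| + 1} f_u(W) = q_u(T).$$

Similarly, if $|T|$ is even, then $(-1)^{|T| - |S|} = (-1)^{|S|}$, so we can rewrite the discrete derivative as

$$\sum_{S \subseteq T} (-1)^{|S|} f_u(\overline{T} \cup S) = \sum_{W \supseteq \overline{T}} (-1)^{|W \cap T|} f_u(W) = -q_u(T).$$

Thus, the $q_u(T)$ are all non-negative (and the probability distribution thus well-defined) if and only if
$(f_u)_T(\overline{T}) \geq 0$ for $|T|$ odd, and $(f_u)_T(\overline{T}) \leq 0$ for $|T| > 0$ even. ■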

3.2 Coverage Property of the Seed Set Process 

In this section, we establish the following theorem. 

Theorem 13 The Seed Set Process is a coverage process. 

Proof. In order to prove this theorem, we want to apply Theorem [5]. To do so, we need to show that the
local decisions of nodes about sharing can be cast in terms of submodular threshold functions. Specifically,
we define

\[ f_u(S) := 1 - \frac{1}{C_u} \cdot c_u \cdot \sum_v \frac{d_v p_{v,u}}{\sum_{w \in S \cup \{u\}} p_{v,w}} \]

and let $\theta_u = 1 - \frac{\lambda_u \pi_u}{C_u}$. (Recall from Section HOI that $C_u = c_u \cdot \sum_v d_v$.)

A node $u$ becomes active if doing so has positive utility, i.e., if
$\lambda_u \pi_u > c_u \cdot \sum_v \frac{d_v p_{v,u}}{\sum_{w \in S \cup \{u\}} p_{v,w}}$. Dividing
both sides by $C_u$, and subtracting from 1 shows that this is equivalent to saying that

\[ 1 - \frac{\lambda_u \pi_u}{C_u} < 1 - \frac{1}{C_u} \cdot c_u \cdot \sum_v \frac{d_v p_{v,u}}{\sum_{w \in S \cup \{u\}} p_{v,w}}. \]

Since $\lambda_u \pi_u$ is uniformly random in $[0, C_u]$ by the definition of $\pi_u$ in the Seed Set Model, this condition is
equivalent to saying that $\theta_u < f_u(S)$. Thus, we have shown that the activation process can be equivalently
recast in terms of threshold activation functions.
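The equivalence between the utility form and the threshold form of the activation rule can be illustrated with a small sketch. It assumes that the demand $d_v$ of each node $v$ is split among the sharing nodes in proportion to the preferences $p_{v,w}$, as in the cost expression above; the function names and the list-of-lists representation of $p$ are hypothetical choices made for this example.

```python
def sharing_cost(u, S, c, d, p):
    """Cost to u of sharing when the nodes in S already share.

    p[v][w] is v's preference for downloading from w (assumed representation);
    v's demand d[v] is split among sharing nodes in proportion to p[v][w].
    """
    sharing = set(S) | {u}
    load = 0.0
    for v in range(len(d)):
        if p[v][u] > 0:
            load += d[v] * p[v][u] / sum(p[v][w] for w in sharing)
    return c[u] * load


def local_influence(u, S, c, d, p):
    """f_u(S) = 1 - cost / C_u, with C_u = c_u * sum_v d_v."""
    C_u = c[u] * sum(d)
    return 1.0 - sharing_cost(u, S, c, d, p) / C_u


def activates(u, S, utility, c, d, p):
    """Utility form: u shares iff lambda_u * pi_u exceeds its sharing cost."""
    return utility > sharing_cost(u, S, c, d, p)


def activates_by_threshold(u, S, utility, c, d, p):
    """Threshold form: u shares iff theta_u = 1 - utility / C_u is below f_u(S)."""
    C_u = c[u] * sum(d)
    return 1.0 - utility / C_u < local_influence(u, S, c, d, p)
```

In this sketch, adding nodes to `S` can only lower the cost seen by `u`, so `local_influence` is non-decreasing in `S`.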

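Finally, we need to show that for every node $u$, all derivatives $f^u_T(S)$ are non-negative when $|T|$ is odd
and non-positive when $|T| > 0$ is even. (The fact that $f_u(S) = f^u_\emptyset(S)$ is non-negative follows directly by
definition.) Let

\[ \hat{f}_u(x_1, \ldots, x_n) = 1 - \frac{1}{C_u} \cdot c_u \cdot \sum_v \frac{d_v p_{v,u}}{\sum_{v_j \in V} p_{v,v_j} x_j} \]

be the continuous equivalent of the local influence function $f^u$. For a set $S$, let $y^{(S)}$ denote the $n$-dimensional
vector with $y^{(S)}_i = 1$ if $v_i \in S \cup \{u\}$ and $y^{(S)}_i = 0$ otherwise. Then, $f_u(S) = \hat{f}_u(y^{(S)})$. Notice that by
definition, there is no division by zero.

Writing $dY_T = dy_{i_1} dy_{i_2} \cdots dy_{i_{|T|}}$, where $T = \{i_1, i_2, \ldots, i_{|T|}\}$, an easy inductive proof first shows that

\[ f^u_T(S) = \int_0^1 \cdots \int_0^1 \frac{\partial \hat{f}_u(y^{(S)})}{\partial Y_T} \, dY_T. \]

It remains to show that each term inside the integration is non-negative for odd $|T|$ and non-positive for
even $|T|$. We accomplish this by showing that

\[ \frac{\partial \hat{f}_u(y^{(S)})}{\partial Y_T} = (-1)^{|T|+1} \, |T|! \cdot \frac{c_u}{C_u} \sum_v \frac{d_v p_{v,u} \prod_{t \in T} p_{v,t}}{\left( \sum_{v_j \in V} p_{v,v_j} y_j^{(S)} \right)^{|T|+1}}. \]

The proof is by induction. The base case $|T| = 1$ can be verified easily. Assume that the claim holds
for $|T| = i - 1$. We have

\begin{align*}
\frac{\partial \hat{f}_u(y^{(S)})}{\partial Y_T \, \partial y_i}
&= \frac{\partial}{\partial y_i} \left[ (-1)^{|T|+1} \, |T|! \cdot \frac{c_u}{C_u} \sum_v \frac{d_v p_{v,u} \prod_{t \in T} p_{v,t}}{\left( \sum_{v_j \in V} p_{v,v_j} y_j^{(S)} \right)^{|T|+1}} \right] \\
&= (-1)(-1)^{|T|+1} \, |T|! \cdot \frac{c_u}{C_u} \sum_v \frac{(|T|+1) \, p_{v,v_i} \, d_v p_{v,u} \prod_{t \in T} p_{v,t}}{\left( \sum_{v_j \in V} p_{v,v_j} y_j^{(S)} \right)^{|T|+2}} \\
&= (-1)^{|T|+2} \, (|T|+1)! \cdot \frac{c_u}{C_u} \sum_v \frac{d_v p_{v,u} \prod_{t \in T \cup \{v_i\}} p_{v,t}}{\left( \sum_{v_j \in V} p_{v,v_j} y_j^{(S)} \right)^{|T|+2}}.
\end{align*}

This completes the inductive proof, and thus the proof of Theorem [13]. ■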

While we defined the Seed Set Process primarily as a tool for analysis, we remark here that Theorem [13]
has a direct consequence for the optimization problem of maximizing the expected total number of active
nodes at the end of the process, subject to a size constraint on the seed set $S$. A theorem of Nemhauser et
al. [7][19] states that if $f$ is any non-negative, monotone, and submodular function on sets, then the greedy
algorithm is a polynomial-time $(1 - 1/e)$-approximation (where $e$ is the base of the natural logarithm). Since
we can approximate the expected number of active nodes under the Seed Set Process arbitrarily closely by
simulating the activation process (see [14] for an in-depth discussion of the greedy algorithm), we obtain the
following corollary:

Corollary 14 The best starting set $S$ for the Seed Set Process can be approximated within $(1 - 1/e - \epsilon)$ in
polynomial time, for any $\epsilon > 0$.
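The greedy algorithm behind Corollary 14 can be sketched as follows. This is an illustrative outline rather than the authors' implementation: `simulate_once` stands for any routine (assumed here, not specified in the text) that runs the activation process once from a given seed set and returns the set of active nodes, and the expected spread is estimated by averaging over repeated runs.

```python
import random


def monte_carlo_spread(simulate_once, runs=200, seed=0):
    """Build an estimator of the expected number of active nodes.

    simulate_once(seeds, rng) is a hypothetical callback returning the set of
    nodes active at the end of one random run started from `seeds`.
    """
    rng = random.Random(seed)

    def estimate(seeds):
        total = sum(len(simulate_once(frozenset(seeds), rng)) for _ in range(runs))
        return total / runs

    return estimate


def greedy_seed_set(nodes, k, estimate_spread):
    """Add, k times, the node with the largest estimated resulting spread."""
    seeds = set()
    for _ in range(k):
        best_node, best_value = None, None
        for v in nodes:
            if v in seeds:
                continue
            value = estimate_spread(seeds | {v})
            if best_node is None or value > best_value:
                best_node, best_value = v, value
        if best_node is None:  # fewer than k candidate nodes
            break
        seeds.add(best_node)
    return seeds
```

The approximation guarantee relies on the estimated objective being monotone and submodular; sampling error in `estimate_spread` is what gives rise to the additional $\epsilon$ term.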

3.3 Diminishing Returns of Expected Social Welfare 

Finally, we use the machinery of coverage processes to show diminishing returns of social welfare. Consider
an arbitrary coverage process. When the coverage process starts with the set $T$, let $\phi(T)$ be a random
variable describing the set of nodes active at the end of the process. Thus, the distribution of $\phi(T)$ for all $T$
precisely characterizes the coverage process. Our main theorem is now the following:

Theorem 15 Let $h(S)$ be any monotone submodular function of $S$. Then, $E[h(\phi(T))]$ is a monotone
submodular function of $T$, where the expectation is taken over the randomness in $\phi(T)$.

This theorem follows from the general result of [18], since all coverage processes are locally submodular, and
our utility function is submodular with respect to the set of sharing neighbors. However, below we give a
very simple proof based on reachability in graphs using the fact that $\phi$ is a coverage process. This is useful
for the purpose of simulating the process and estimating $\phi$. It means that instead of generating random
thresholds and simulating a dynamic process, we can generate a random graph and then simply use BFS to
find the number of reachable nodes.

Proof. Because $\phi$ is a coverage process, by Theorem H] there is a distribution $\Pr[\cdot]$ over graphs $H$ such
that for any set $T$, the set of nodes reachable in $H$ from $T$ has the same distribution as $\phi(T)$. Let $\phi_H(T)$
denote the set of nodes reachable from $T$ in $H$. Then,

\[ E[h(\phi(T))] = \sum_H \Pr[H] \cdot h(\phi_H(T)). \]

Fix some graph $H$ and let $S \subseteq T$ and $x \notin T$. Then,

\begin{align*}
h(\phi_H(T + x)) - h(\phi_H(T)) &= h(\phi_H(T) \cup \phi_H(\{x\})) - h(\phi_H(T)) \\
&\le h(\phi_H(S) \cup \phi_H(\{x\})) - h(\phi_H(S)) \\
&= h(\phi_H(S + x)) - h(\phi_H(S)),
\end{align*}

where the inequality followed from Inequality (TTJ), and the equalities from the definitions of reachability in a
graph. Thus, for any fixed graph $H$, the function $h(\phi_H(T))$ is monotone and submodular in $T$. Because the
$\Pr[H]$ are probabilities, $E[h(\phi(T))]$ is a non-negative linear combination of monotone submodular functions,
and thus also monotone and submodular. ■
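The reachability view used in this proof also suggests a simple way to estimate $E[h(\phi(T))]$ by sampling. The sketch below assumes one particular way of drawing the graph $H$: each node $u$ independently picks a set $T$ of in-neighbors with probability $q_u(T)$ (and no in-neighbors with the remaining probability). The dictionary layout of `q` and all function names are assumptions made for illustration.

```python
from collections import deque
import random


def sample_graph(q, rng):
    """Draw one graph H. q[u] maps frozenset T -> probability that u picks T.

    Exactly one random number is consumed per node, so runs with the same
    seed produce the same sequence of graphs regardless of the seed set.
    """
    adj = {u: [] for u in q}
    for u, dist in q.items():
        r = rng.random()
        acc = 0.0
        for T, prob in dist.items():
            acc += prob
            if r < acc:
                for t in T:
                    adj[t].append(u)  # edge t -> u: t can activate u
                break
    return adj


def reachable(adj, seeds):
    """Breadth-first search: all nodes reachable from the seed set."""
    seen = set(seeds)
    queue = deque(seen)
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


def expected_welfare(q, seeds, h, runs=1000, seed=0):
    """Average h over the reachable sets of `runs` sampled graphs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        total += h(reachable(sample_graph(q, rng), seeds))
    return total / runs
```

With `h = len`, the estimate approximates the expected number of active nodes; any other monotone submodular `h` can be passed in the same way.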

The final piece of the proof of Theorem [3] is the following lemma, showing that monotonicity and
submodularity of the Seed Set Model imply diminishing returns for the original model.

Lemma 16 Let $f$ be a non-negative, monotone, submodular function on sets. Consider the function $g$
defined as follows: Each element $u$ is included in $S$ independently with probability $q_u(\pi_u)$, where $q_u$ is an
increasing and concave function of $\pi_u$. Define $g(\pi) = E[f(S)]$. Then, $g$ is monotone and satisfies the
diminishing returns property as defined in Definition [7].

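Proof. First, notice that

\[ g(\pi) = \sum_{S' \subseteq V} f(S') \prod_{u \in S'} q_u(\pi_u) \prod_{u \notin S'} (1 - q_u(\pi_u)). \]

In order to show the diminishing returns property, it is enough to show that $\frac{\partial g}{\partial \pi_i} \ge 0$ and
$\frac{\partial^2 g}{\partial \pi_i \partial \pi_j} \le 0$ for all $i, j \in V$. Using the definition of $g$, we have:

\begin{align*}
\frac{\partial g}{\partial \pi_i}
&= \sum_{S \subseteq V, i \in S} f(S) \cdot \frac{\partial q_i}{\partial \pi_i} \cdot \prod_{u \in S, u \ne i} q_u(\pi_u) \cdot \prod_{u \notin S} (1 - q_u(\pi_u)) \\
&\quad - \sum_{S \subseteq V, i \notin S} f(S) \cdot \frac{\partial q_i}{\partial \pi_i} \cdot \prod_{u \in S} q_u(\pi_u) \cdot \prod_{u \notin S, u \ne i} (1 - q_u(\pi_u)) \\
&= \sum_{S \subseteq V, i \in S} (f(S) - f(S - i)) \cdot \frac{\partial q_i}{\partial \pi_i} \cdot \prod_{u \in S, u \ne i} q_u(\pi_u) \cdot \prod_{u \notin S} (1 - q_u(\pi_u)) \\
&\ge 0.
\end{align*}

The last inequality holds because $\frac{\partial q_i}{\partial \pi_i} \ge 0$ and $f$ is monotone.

Next we need to show that $\frac{\partial^2 g}{\partial \pi_i \partial \pi_j} \le 0$ for all $i, j \in V$. For $i = j$, a calculation similar to the one above
shows that

\[ \frac{\partial^2 g}{\partial \pi_i^2} = \sum_{S \subseteq V, i \in S} (f(S) - f(S - i)) \cdot \frac{\partial^2 q_i}{\partial \pi_i^2} \cdot \prod_{u \in S, u \ne i} q_u(\pi_u) \cdot \prod_{u \notin S} (1 - q_u(\pi_u)), \]

which is non-positive because $f$ is monotone and $q_i$ is concave.

Finally, suppose that $i \ne j$. Using a calculation similar to the one above, we can rewrite $\frac{\partial^2 g}{\partial \pi_i \partial \pi_j}$ as

\[ \sum_{S \subseteq V \setminus \{i,j\}} (f(S + i + j) - f(S + i) - f(S + j) + f(S)) \cdot \frac{\partial q_i}{\partial \pi_i} \cdot \frac{\partial q_j}{\partial \pi_j} \cdot \prod_{u \in S} q_u(\pi_u) \cdot \prod_{u \notin S, u \ne i,j} (1 - q_u(\pi_u)), \]

which is non-positive because $f$ is submodular and $q_i, q_j$ are increasing. ■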
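The conclusion of Lemma 16 can be illustrated numerically on a small instance. The sketch below computes $g(\pi) = E[f(S)]$ exactly by enumerating all subsets and compares marginal gains at a lower and a higher payment vector. The coverage function `coverage_value` and the inclusion probability `q_exp` are hypothetical examples of a monotone submodular $f$ and an increasing concave $q_u$; they are not taken from the paper.

```python
from itertools import combinations
import math

# Hypothetical example data: element u "covers" the items in COVER[u].
COVER = {0: {0, 1}, 1: {1, 2}, 2: {2, 3, 4}}


def coverage_value(S):
    """A monotone submodular f: number of items covered by the elements of S."""
    return len(set().union(*[COVER[u] for u in S]))


def q_exp(payment):
    """An increasing, concave inclusion probability (illustrative choice)."""
    return 1.0 - math.exp(-payment)


def expected_value(f, q_values):
    """E[f(S)] when element u joins S independently with probability q_values[u]."""
    n = len(q_values)
    total = 0.0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            members = set(S)
            prob = 1.0
            for u in range(n):
                prob *= q_values[u] if u in members else 1.0 - q_values[u]
            total += prob * f(frozenset(S))
    return total


def g(f, q, payments):
    """g(pi) = E[f(S)] under inclusion probabilities q(pi_u)."""
    return expected_value(f, [q(p) for p in payments])


def marginal_gain(f, q, payments, i, delta):
    """Increase in g when the payment to element i is raised by delta."""
    bumped = list(payments)
    bumped[i] += delta
    return g(f, q, bumped) - g(f, q, payments)
```

For payment vectors `low <= high` (componentwise), the lemma predicts `marginal_gain(..., low, i, delta) >= marginal_gain(..., high, i, delta) >= 0` for every element `i`.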

With Theorem [13] and Lemma [16], we can now complete the proof of Theorem [3].

Proof of Theorem [3]. Consider one node $u$. The probability that it becomes active initially is

\[ p^0_u = \mathrm{Prob}[\lambda_u \pi_u \ge C_u] = 1 - \frac{C_u}{\pi_u}. \]

Recall that $C_u = c_u \cdot \sum_v d_v$, and $\pi_u \ge C_u$ in our model, so this number is always non-negative.

Clearly, $p^0_u$ is also a monotone increasing function of $\pi_u$. To verify concavity, we simply take two
derivatives: the second derivative is $-\frac{2 C_u}{\pi_u^3}$, and thus non-positive, so $p^0_u$ is concave.

Now, consider all the nodes $u$ which did not initially become active. This is equivalent to saying that
$\lambda_u \pi_u < C_u$. But subject to this bound, $\lambda_u \pi_u$ is uniformly random, so we are in the situation of having an
initially active set $S$, and for each remaining node $u$, the payment is independently and uniformly random in
$[0, C_u]$. By Theorems [13] and [15], the expected social welfare $W(S)$ is a monotone and submodular function
of the seed set $S$, so long as $W$ is submodular in the set of active nodes. We can therefore apply Lemma [16]
to $E[h(\phi(T))]$, which implies that $W(\pi_1, \ldots, \pi_n)$ has the diminishing returns property. ■

Each of the social welfare functions listed in Section [2] can be shown to be monotone and submodular 
in the set of active nodes by simple calculations. Thus, for all of these objective functions, the total social 
welfare is a monotone function of the payments with diminishing returns properties. 

4 Experimental evaluation 

In this section, we summarize our observations based on simulations both on synthetic and real-world P2P 
networks. 

We have developed a simulator for the three models described in Section [2] 

4.1 Simulation model 

Given a payment scheme $\pi$, we generate random $\lambda_u$ and compute the number of active (sharing) nodes. We
also compute the value of the social welfare according to the utility functions in Section [2].

In addition, we calculate the total payments, and the average payment per active and per serviced node.
These numbers are averaged over 1000 iterations, each with different random $\lambda$.

Network topology. For our evaluation, we consider different network topologies, including two network
topologies derived from real-world data sets [13][20][31], and a regular two-dimensional grid topology. The
real-world data sets are based on measured end-to-end latencies between pairs of servers deployed in the
Internet [13]. The MIT King data set [31] is symmetric and measures RTT between each pair among
1740 servers, while the Harvard King data set [20] provides asymmetric median latencies between each pair
among 1895 servers.

We derive the download percentage matrix $P$ from the latencies by setting $p_{v,u} = \max(0, 1 - \lambda_{u \to v} / \tau)$,
where $\lambda_{u \to v}$ is the latency from $u$ to $v$, and $\tau$ is a hard threshold for tolerable latencies. This models the fact
that users prefer to download from peers to which they have fast connections, and have a threshold beyond
which latency may not be tolerable any more. By varying $\tau$, we can obtain denser or sparser download
network topologies. We will refer to the networks derived from the MIT King data set as MIT networks,
and those derived from the Harvard King data set as Harvard networks.
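As an illustration of how a download matrix could be derived from a latency table, the following sketch assumes a linear decay of preference with latency that reaches zero at the threshold. Both the linear form and the nested-list representation (`latency[u][v]` is the latency from `u` to `v`, `P[v][u]` is the preference of `v` for downloading from `u`) are assumptions of this example.

```python
def download_matrix(latency, tau):
    """Return P with P[v][u] = max(0, 1 - latency[u][v] / tau).

    latency[u][v] is the measured latency from u to v (same unit as tau).
    Pairs whose latency reaches or exceeds tau get preference 0.
    """
    n = len(latency)
    return [
        [max(0.0, 1.0 - latency[u][v] / tau) for u in range(n)]
        for v in range(n)
    ]


def average_degree(P):
    """Average number of other nodes with positive preference, per node."""
    n = len(P)
    positive = sum(1 for v in range(n) for u in range(n) if u != v and P[v][u] > 0)
    return positive / n
```

Raising `tau` can only turn zero entries into positive ones, which is why larger thresholds give denser networks.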

In addition to networks derived from these two data sets, we also consider a regular two-dimensional 
grid. We do not report all results for all topologies here. Unless stated otherwise, our observed trends apply 
to all of these topologies. 

Payment schemes and non-sharing peers. In our experiments, we consider different payment schemes
$\pi$, to study the impact of payments on the propagation of sharing behavior. We parameterize the schemes
with two parameters $\alpha, \beta$, and set $\pi_u = \alpha \cdot d_u^\beta$, where $d_u$ is the degree of node $u$ in the network defined by
the $p_{u,v}$ values. Thus, the financial utilities are chosen uniformly at random from the interval $[0, \alpha \cdot d_u^\beta]$.
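The degree-based payment scheme can be sketched in a few lines. The example below assumes that the degree of $u$ counts the other nodes $v$ with $p_{v,u} > 0$, which is one possible reading of "the network defined by the $p_{u,v}$ values"; the function names are illustrative.

```python
import random


def degrees(P):
    """Degree of u: number of other nodes v with P[v][u] > 0 (assumed convention)."""
    n = len(P)
    return [sum(1 for v in range(n) if v != u and P[v][u] > 0) for u in range(n)]


def payments(P, alpha, beta):
    """Payment offered to each node: pi_u = alpha * d_u ** beta."""
    return [alpha * d ** beta for d in degrees(P)]


def sample_utilities(pi, rng):
    """Draw each node's financial utility uniformly from [0, pi_u]."""
    return [rng.uniform(0.0, p) for p in pi]
```

With `beta = 0` every node is offered the same payment `alpha`, while `beta = 1` makes payments proportional to degree.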

We also consider the impact of peers who cannot (or do not want to) share the file at all, regardless of the 
payment offered. Such peers may still be interested in downloading the file. Their presence can be expected 
to decrease the sharing behavior in networks, as they will place load on other peers without contributing. 
We call such nodes "Empty" nodes, and consider the impact of different percentages of Empty nodes on the 
overall sharing percentage. 
(a) Percentage active nodes (b) Active nodes per unit of payment

Figure 1: Comparison of different models, using the Harvard network with no Empty nodes. 



4.2 Results 

Comparison of different models. We begin by estimating the size of network effects, by comparing the
Demand model with the No-Network and One-Hop models. Figure [1](a) compares the participation rates
under the three models, with the same payment scheme and same network (Harvard). We keep $\beta = 1$
constant in the payment scheme, and vary $\alpha$. Thus, payments are proportional to nodes' degrees. The figure
shows that by ignoring network effects, we would underestimate the number of sharing nodes by about 15%
on average, and as much as 25% (for $\alpha = 1.2$). The same trends hold for the fraction of serviced nodes (not
shown here): the number of serviced nodes is underestimated by about 10% if ignoring network effects.

Figure [1](b) compares the number of active nodes per unit of payment spent by the network administrator.
This is an interesting metric as it captures the tradeoff between participation and payments. Compared to
the number of active nodes, the choice of model seems to have remarkably little impact on the estimate of
this quantity. For small values of $\alpha$, the network effects lead to slightly higher payments per active node,
as the network effects lead to an activation of more high-degree nodes, which have higher payments. This
effect disappears as $\alpha$ increases, and more nodes are activated in the No-Network model as well. The same
trends hold for the number of serviced nodes per unit of payment spent (not shown here).

The results reported here stay essentially the same both for the MIT and grid topologies. In particular, 
the underestimate of the number of active nodes by the No-Network model is essentially the same in these 
topologies. In the grid topology, the No-Network model in fact overestimates the cost per active node by 
about 10%, as the dependence on the degree disappears, and network effects lead to an activation of more 
nodes with smaller payments. 

Different social welfare functions. We evaluate our theoretical results against the number of serviced
nodes and the two social welfare functions sum-welfare and max-welfare, as defined in Section [2]. All three
are plotted in Figure [2]. Although each social welfare function differs from the others in terms of the degree
of submodularity (for example, sum-welfare can be shown to be completely modular in the number of active
nodes), the curvatures of the plots as a function of payments are more or less the same. Thus, the concavity
(diminishing returns) appears to be dominated by the submodularity of the activation process.

Different payment schemes. For a network administrator, it is particularly interesting how the choice
of payments will affect sharing behavior, and the cost-effectiveness of achieving a certain participation rate.
Our next set of experiments therefore shows the percentage of active nodes, and the number of active nodes
per unit of payment, when the parameters $\alpha$ and $\beta$ in the payments $\pi_u = \alpha \cdot d_u^\beta$ are varied.

Figure 2: Sum-Welfare, Max-Welfare and the number of serviced nodes, Harvard network.

Figure [3](a) shows the percentage of active nodes in the Harvard network with 50% Empty nodes, as a
function of $\alpha$ and $\beta$. Figure [3](b) shows the number of active nodes per unit of payment under the same
setting. The cost effectiveness is maximized for very small values of $\alpha$ and $\beta$, specifically $\beta = 0$ and $\alpha = 1.2$.
However, this comes at a steep price, in that almost no nodes (only about 4.4% of the network) share in this
case.

Clearly, there is no single point at which the network should operate. Rather, a network administrator
who wants to achieve a certain participation rate can use these plots to find the most cost-effective payment
scheme to achieve this rate. For instance, if the goal is to achieve 30% sharing, this can be achieved by
setting $\alpha = 1.6$ and $\beta = 1.5$, or $\alpha = 1.8$ and $\beta = 1$. Of these, the first scheme spends about 30 units per
active node, while the second scheme spends about 7 units per active node. Thus, a judicious choice of
payments can lead to significant savings while ensuring the same level of participation.

In general, the plot suggests that $\beta \in [0.5, 1]$ tends to lead to good tradeoffs between participation and
cost: for smaller values of $\beta$, participation tends to be too low, while for higher values, the cost per active
node increases significantly.

The observed trends are fairly independent of the network topologies. In particular, the plots for both
the grid and MIT network also suggest that $\beta \in [0.5, 1]$ gives the best cost efficiency for a given fraction of
participating nodes.

Different thresholds ($\tau$). Finally, we investigate the impact of different latency tolerance thresholds $\tau$ on
the activation process. Recall that the larger $\tau$, the more peers $u$ may serve $v$. For instance, with $\tau = 2$ms,
the average degree of nodes in the Harvard network is 4.58, while with $\tau = 5$ms, the average degree increases
to 14.93. In the resulting denser graph, we would expect less degree imbalance, and overall higher network
effects; however, the payments will need to compensate for more downloads from any individual node.

The experiments, conducted on the Harvard network with no Empty nodes, confirm this intuition. When
$\beta = 0$, Figure [4](a) shows that the number of nodes serviced is smaller in Harvard$_{\tau=5\text{ms}}$ than in Harvard$_{\tau=2\text{ms}}$.
The reason is that the payments do not increase with the degree, so it is costlier for nodes in Harvard$_{\tau=5\text{ms}}$
to become active. As $\beta$ increases, and high degrees result in higher compensation, more nodes are serviced
in Harvard$_{\tau=5\text{ms}}$. With $\beta > 0$, payments increase in the node degree, and nodes in Harvard$_{\tau=5\text{ms}}$ receive
more payments because of their higher average degree. Thus, more nodes are activated, and as a result,
more nodes can be serviced.

The increased activation comes at a price, as seen in Figure [4](b). The higher average degree in
Harvard$_{\tau=5\text{ms}}$, combined with the dependence of payments on the degrees, leads to somewhat higher
payments per active (or serviced) node. Thus, in the Demand model, the increased participation in denser
networks is not only a result of network effects, but also of higher payments.

Figure 3: Comparison of different payment schemes, Harvard network with 50% Empty nodes. Notice that
for readability, the directions of the axis labels for $\alpha, \beta$ are different in the two figures.

(a) Fraction of serviced nodes (b) Serviced nodes per unit of payment

Figure 4: Comparison of different thresholds, Harvard network with no Empty nodes.

Therefore, in order to investigate the effectiveness of density itself on the participation or service rate, we
make the following comparison. Fix the payment per active node for both Harvard$_{\tau=5\text{ms}}$ and Harvard$_{\tau=2\text{ms}}$
to an arbitrary number by choosing the appropriate payment schemes for each graph. For example, in order
to get a payment of 23 per active node, a payment scheme for Harvard$_{\tau=5\text{ms}}$ would be $\alpha = 2.7$ and $\beta = 1$
and for Harvard$_{\tau=2\text{ms}}$ would be $\alpha = 1$ and $\beta = 1.5$. It turns out that the denser network (Harvard$_{\tau=5\text{ms}}$)
gives a significantly higher rate for both participation and service. For a payment of 23 units per active
node, for instance, the fraction of participating nodes for Harvard$_{\tau=5\text{ms}}$ is 86% while the same fraction goes
down to 39% in Harvard$_{\tau=2\text{ms}}$.

Based on the simulations, the following were our main observations:

1. How different are the predictions in sharing behavior between the Demand, the No-Network, and the
One-Hop models? Our results show a significant difference between the models in their prediction of
sharing: while the fraction of sharing nodes is qualitatively similar, the predictions ignoring network
effects can be off by about 15%-25%. This results in an underestimate of up to 10% in the number of
serviced peers.

2. How does the participation depend on the network topology and density? We observe that the denser
the network, the higher the rate of participation, given fixed incentives. This holds across grid and
realistic Internet topologies.

3. How does the payment scheme affect the number of sharing and serviced nodes, and the price paid per
node? Our experiments suggest that the payments $\pi_u$ for realistic topologies should be proportional to
$u$'s degree to give high overall participation at low cost. In other words, given a network topology, there
exists a choice of parameters for payments proportional to node degrees that maximizes the overall
"bang per buck". We derive these parameters for each network topology experimentally.

5 Conclusions 

There are several natural directions for future work. A very interesting question arises when taking payments 
by "reputation" or download priorities into account. While monetary payments can (in principle) be increased 
arbitrarily, reputation is inherently constant-sum: if some peers are recognized as outstanding sharers, then 
others will receive less recognition, and might find the reduced recognition not enough incentive to keep 
sharing. Similarly, download priorities come at the expense of other peers, and can thus not be arbitrarily 
increased for all members of the network. As a result, the process of sharing will not necessarily be monotone: 
peers may choose to stop sharing once too many other peers are active. A first question is then whether 
stable (equilibrium) states even exist. If so, it would be interesting to determine what fraction of the peers
will be sharing, what the social welfare is, and how these quantities will depend on the network structure.

From a more practical viewpoint, it would be desirable to evaluate how accurately our model (or a 
variation thereof) captures the actual behavior of participants in a P2P system. This would likely be a 
difficult experiment to perform, as many of the parameters, such as file demands and latency, are inherently
transient, and in a realistic system, payments cannot be changed constantly to evaluate the impact of such 
changes. 

In the bigger picture, the network designer also has to be concerned about manipulation by peers. For 
instance, colluding peers could artificially inflate the perceived "degree" of a peer (by claiming a download 
preference), and thus the payments to that peer. A more thorough investigation of mechanisms taking these 
and other concerns into account is an exciting direction for future work. 

Finally, our work lies among various applications in economics for which there are positive or negative 
externalities among agents in a neighborhood. Our results suggest that in order to study different economic 
metrics such as revenue or social welfare, we should always consider the cascading effect of agents' strategies 
over the network. 
A Hardness of Approximation under the Seed Set Model

Here, we prove that finding a seed set $S$ to (even approximately) maximize the eventual number of active
nodes is hard under the Sharing Process. Let Best Seed be the optimization problem of finding the seed
set $S$ of at most $k$ nodes that maximizes the total number of sharing nodes, given $n$ servers $u_1, \ldots, u_n$ and
the corresponding parameters $c_u, d_u, \lambda_u, \lambda_u \pi_u, p_{v,u}$. (Notice that when all of the $\lambda_u$ are given, the process is
deterministic.)

Proposition 17 It is hard to approximate Best Seed within $n^{1-\epsilon}$ for any $\epsilon > 0$ unless P = NP.

Proof. We reduce from the Vertex Cover problem. Recall that the Vertex Cover problem is
formulated as follows: Given a graph $G = (V, E)$, a vertex cover is a set $S \subseteq V$ of nodes such that each edge
$e \in E$ has at least one endpoint in $S$. In the Vertex Cover decision problem, the input is a pair $(G, k)$:
the question is whether there is a vertex cover of size at most $k$. We assume without loss of generality that
$G$ contains no isolated vertices.

Given an arbitrary Vertex Cover instance with $N = |V|$ nodes and $M = |E| \ge N/2$ edges, we
construct an instance of Best Seed as follows: For each node $u \in V$, we have a node $w_u$. For each edge
$e \in E$, we create two nodes $x_e, x'_e$. Finally, setting $r = 1/\epsilon$, we create $M^r$ "bulk" nodes $y_1, \ldots, y_{M^r}$. We set
$p_{x'_e, y_i} = 1$ for all $y_i, x'_e$. For all $e$, $p_{x'_e, x_e} = 1$. Finally, whenever $e$ is incident on $u$, we have $p_{x_e, w_u} = 1$. All
other values of $p$ are 0.

We visualize the construction above in 4 layers. The "node layer" consists of all nodes $w_u$ for all $u$. The
"primary layer" consists of all $x_e$. The "secondary layer" consists of all $x'_e$. Finally, the "bulk layer" consists
of all $y_i$. Next, we define payments and demands:



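\[ \lambda_v \pi_v = \begin{cases}
0 & \text{if } v = w_u \text{ for some } u \in V \text{ (node layer)} \\
3.5 & \text{if } v = x_e \text{ for some } e \in E \text{ (primary layer)} \\
0 & \text{if } v = x'_e \text{ for some } e \in E \text{ (secondary layer)} \\
M + 0.5 & \text{otherwise (bulk layer)}
\end{cases} \]

\[ d_v = \begin{cases}
0 & \text{if } v = w_u \text{ for some } u \in V \text{ (node layer)} \\
2 & \text{if } v = x_e \text{ for some } e \in E \text{ (primary layer)} \\
2 & \text{if } v = x'_e \text{ for some } e \in E \text{ (secondary layer)} \\
0 & \text{otherwise (bulk layer)}
\end{cases} \]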



First, let $T$ be a vertex cover of size at most $k$. Consider the effect of starting with the nodes $w_u$, $u \in T$
as a seed set. Because $T$ is a vertex cover, each primary node $x_e$ now has an active node $w_u$ with $p_{x_e, w_u} = 1$,
so that its demand of 2 is split between itself and (at least) one node $w_u$. Thus, upon activation, it would
see demand 2 from $x'_e$ and 1 from itself, whereas its payment is 3.5. Hence, each primary node
will become active in the second round. Once the primary node $x_e$ is active, $x'_e$ will split its demand evenly
between $x_e$ and all active bulk nodes. Hence, each bulk node $y_i$ will see demand at most 1 from each $x'_e$, for
a total of $M$. Since its payment offer is larger, $y_i$ will become active. Hence, all bulk nodes will be active by
round 3, and the total number of active nodes is at least $M^r + M + k$.

Conversely, suppose that strictly more than $M + N$ nodes are active. Because none of the secondary
nodes ever become active (since they have a payment offer of 0), this means that at least one bulk node
must be active. Let $y_i$ be the first bulk node to become active, breaking ties arbitrarily. Because no other
bulk nodes are active at this time, $y_i$ must see demand at least 1 from each secondary node $x'_e$. And because
its payment offer is only $M + 0.5$, this means that it cannot see demand 2 from any secondary node;
otherwise, the total demand would exceed the payment. This means that for each secondary node $x'_e$, the
corresponding primary node $x_e$ must already be active. Without loss of generality, the seed set contained no
primary nodes; otherwise, the node $x_e$ could be replaced by $w_u$ (where $u$ is an endpoint of $e$), which would
next activate $x_e$. Thus, $x_e$ must have become activated at some point of the process, which can only happen
when its total demand is smaller than its payment. Since at that point, only $x_e$ can serve the demand of
$x'_e$, this in turn means that $x_e$'s own demand must be split between itself and one or more active nodes $w_u$.
Thus, if $S$ is the set of initially active nodes in the node layer, then the corresponding vertices of $G$ must
form a vertex cover.

In summary, if there is a vertex cover of size at most $k$, then there is a seed set of size at most $k$
activating at least $M^r + M + k$ nodes, whereas otherwise, no seed set of size at most $k$ can activate more
than $M + N < 3M$ nodes. Thus, no approximation better than $\Omega(M^{r-1})$ is possible. Since the total number
of nodes is $n = M^r + 2M + N < 2M^r$ (for $r$ large enough), this proves an approximation hardness of
$\Omega(n^{1 - 1/r}) = \Omega(n^{1 - \epsilon})$, unless P=NP. ■






